Abstract

Recently, there has been an increasing level of interest in reporting subscores. This paper 
examines the issue of reporting subscores at an aggregate level, especially at the level of 
institutions that the examinees belong to. A series of statistical analyses is suggested to 
determine when subscores at the institutional level have any added value over the total 
scores. The methods are applied to two operational data sets. For the data under study, 
the results provide little support in favor of reporting subscores for either examinees or 
institutions.

1. Introduction


What are subscores and why are they desirable? Educational and psychological 
tests often have different subsections based on content categories or blueprints. For 
example, a test on mathematics knowledge may have subsections on algebra and geometry. 
Similarly, a test of general ability can have subsections on mathematics, reading, and 
writing. Scores assigned to these subsections are commonly known as subscores. Subscores 
resulting from the administration of tests with high-stakes outcomes are desirable for at 
least two important reasons. First, failing candidates want to know their strengths and 
weaknesses in different content areas to plan for future remedial work. Second, states and 
academic institutions such as colleges and universities want a profile of performance for 
their graduates to better evaluate their training and focus on areas that need instructional 
improvement (Haladyna & Kramer, 2004). 

Despite this apparent usefulness of subscores, certain important factors must be 
considered before making a decision on whether to report subscores at either the individual 
or institutional level. Although many tests are designed to cover a broad domain, and the 
total test score is considered to be a composite of different abilities measured by different 
subsections, it is debatable whether a subsection with fewer items than the total test can 
be viewed as a mini-test that can precisely measure a unique ability. 

Haberman (2005) argued that a subscore may be considered useful only when it 
provides a more accurate measure of the construct being measured than is provided by the 
total score. Wainer et al. (2001) suggested that a test used for diagnostic purposes must 
yield scores that are reliable both for the total test and for the subscores associated with 
specific subsections or content areas. Furthermore, to be useful for diagnostic purposes, 
the subscores must focus as closely as possible on the content areas in which the examinee 
may be having difficulty. Finally, Tate (2004) has emphasized the importance of ensuring 
reasonable subscore performance in terms of high reliability and validity to minimize 
incorrect instructional and remediation decisions. 

From the above review, it is apparent that the quality of the subscores must be assessed 
before considering score reporting at the subscore level. It also serves as an important 
reminder of the following: Just as inaccurate information at the total test score level
can lead to inaccurate pass and fail decisions with damaging consequences to both the
testing programs and test takers, inaccurate information at the subscore level can also
lead to incorrect remediation decisions resulting in large and needless expense for states or
institutions.

The studies cited above have mainly focused on the use of subscores at the examinee 
level (ignoring any information from institutions or state agencies). However, as mentioned 
earlier, subscores at the institutional level could also be of interest for planning remedial 
and training programs. Moreover, subscores may not offer added value at the examinee level 
but may do so at the institutional level. For example, it is possible that the true subscores 
underlying subtests A and B are perfectly correlated (in which case subscores do not have 
any added value) within each institution in a population of institutions, but the institution 
means may have a lower correlation (in which case subscores may have added value).
Therefore it is important to examine the adequacy of subscores at the institutional level. 

Institutional level subscoring can prove to be useful when there is considerable variation
in test performance between different institutions. For example, on a typical Praxis™ test,
there are several user states and institutions that have examinee populations that may 
differ considerably in terms of the measured ability. Variation in test scores at the state or 
institutional level may justify investigating the use of subscores at these levels. 

This paper performs a thorough analysis to determine when subscores at the 
institutional level have any added value over the total score for the tests concerned. First, 
an individual-level analysis is performed, as in Haberman (2005), that examines whether 
individual-level subscoring is justified. Then, a similar analysis is developed to determine 
whether reporting of test subscores is justified at an institutional level. The approach 
used involves an analysis of proportional reduction in error variance in estimation of true 
institutional subscore means. The basic criterion applied is that the mean subscore for 
examinees from an institution is not worth reporting if the true institutional mean is more 
accurately predicted by the mean total score of examinees from the institution than by 
the mean subscore of examinees from the institution. All the computations involved
are quite simple and use popular software programs, so that operational implementation
is straightforward for the suggested methods. In addition, the methods can be directly
applied if score reporting is considered at a different level of aggregation, say by states
rather than by institutions. 

At the institutional level, the analysis of appropriate reporting practice depends on 
the number of examinees from the institution who take the test under study. Although 
cases certainly can arise in which no evidence exists that reporting of subscores is ever 
appropriate, it is quite common for analysis to reveal that subscores may be usefully 
reported if the number of examinees from the institution is sufficiently large, but reporting 
is inappropriate if the number of examinees in the institution is relatively small. Thus this 
report considers minimum sample-size requirements for reporting of means of subscores of 
examinees from a particular institution. 

Longford (1990) studied the issue of reporting subscores at the college level, performing 
a multilevel variance component analysis on data at the pilot stage of development of a test, 
so that there were issues like voluntary participation of colleges, motivation of students, 
the lack of many colleges, and a model assumption (of normality) that was admitted to 
be “contentious” (p. 111). Our study is different from that of Longford (1990) in three 
basic ways: (a) We perform an analysis using a measure that is very close to the classical 
reliability measure—hence the method is more intuitive, (b) there are no contentious 
model assumptions, and (c) we analyze large operational test data sets from a test with 
high-stakes outcomes that involves a large number of institutions. 

Section 2 describes the methodology involved, and Section 3 discusses the results 
obtained when the methodology is applied to two data sets from a basic skills test with 
high-stakes outcomes belonging to the Praxis series. Discussions and conclusions are 
provided in Section 4. 


2. Methodology and Analysis 

This section describes our step-by-step approach for determining whether, when, and 
how to report institutional-level subscores. We begin with a description of an examinee-level 
analysis in Section 2.1 to determine if the examinee-level subscores offer any added value over
the total scores. That analysis closely follows Haberman (2005). However, the question of
the usefulness of institutional-level subscores is different from the question of usefulness
of the examinee-level subscores. Sections 2.2 and 2.3 describe analyses required at the
institutional level. 

2.1 Examinee-Level Analysis of Mean-Squared Error

At the examinee level, analysis involves the observed subscore $s$, the true subscore $s_t$,
the observed total score $x$, and the true total score $x_t$. It is assumed that $s_t$, $x_t$,
$s - s_t$, and $x - x_t$ all have positive variances. As usual in classical test theory, $s$ and
$s_t$ have common mean $E(s)$, $x$ and $x_t$ have common mean $E(x)$, and the true scores $s_t$
and $x_t$ are uncorrelated with the errors $s - s_t$ and $x - x_t$. For random variables $u$ and
$v$ with finite means and variances, the expectation of $u$ is $E(u)$, the standard deviation of
$u$ is $\sigma(u)$, the variance of $u$ is $\sigma^2(u)$, the covariance of $u$ and $v$ is
$c(u,v)$, the correlation of $u$ and $v$ is $\rho(u,v)$, and the squared correlation of $u$ and
$v$ is $\rho^2(u,v)$. It is assumed that the true subscore $s_t$ and true total score $x_t$ are
not collinear, so that $|\rho(s_t,x_t)|$ is less than 1. This assumption also implies that
$|\rho(s,x)| < 1$.

The following quantities for the examinee-level data are used (ignoring the information 
on the institutions) to determine if the examinee-level subscores have any additional value 
over their total scores: 

1. The reliability $\rho^2(x_t,x)$ of the total test score $x$

2. The reliability $\rho^2(s_t,s)$ of subscore $s$

3. The squared correlation $\rho^2(s_t,x_t)$ of the true subscore $s_t$ and the true total score $x_t$

The KR-20 approach is typically employed to estimate the reliabilities of $s$ and $x$
(Kuder & Richardson, 1937). The squared correlation $\rho^2(s_t,x)$ of the true subscore $s_t$
and the observed total score $x$ is then given by

\[ \rho^2(s_t,x) = \rho^2(s_t,x_t)\,\rho^2(x_t,x). \tag{1} \]

For details on computation of $\rho^2(s_t,x_t)$, see Haberman (2005).
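
As an illustration of the reliability estimate mentioned above, the following is a minimal
sketch of the standard KR-20 formula for dichotomously scored items. The function name
`kr20` and the toy response matrix are hypothetical, and the use of the population variance
(divisor equal to the number of examinees) is an assumption; some implementations use the
sample variance instead.

```python
from statistics import pvariance


def kr20(responses):
    """KR-20 reliability for rows of 0/1 item scores (one row per examinee)."""
    n_items = len(responses[0])
    n_examinees = len(responses)
    # Variance of the number-correct scores (population form, an assumption here).
    total_var = pvariance([sum(row) for row in responses])
    # Sum over items of p * (1 - p), where p is the proportion answering correctly.
    pq_sum = 0.0
    for i in range(n_items):
        p = sum(row[i] for row in responses) / n_examinees
        pq_sum += p * (1.0 - p)
    return (n_items / (n_items - 1)) * (1.0 - pq_sum / total_var)


# Hypothetical toy data: four examinees, three items.
toy = [[1, 1, 1], [1, 1, 0], [1, 0, 0], [0, 0, 0]]
print(kr20(toy))
```

The same function would be applied once to the items of a subscore to estimate
$\rho^2(s_t,s)$ and once to all items to estimate $\rho^2(x_t,x)$.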

In the analysis of Haberman (2005), three basic approaches to prediction of the true
score $s_t$ are considered. In the first or trivial approach, $s_t$ is predicted by the constant
$E(s)$, so that the mean-squared error is $\sigma^2(s_t)$. In the second approach, one based on
the observed subscore $s$, the linear regression

\[ s_s = E(s) + \rho^2(s_t,s)[s - E(s)] \]

of $s_t$ on $s$ predicts $s_t$, and the mean-squared error is $\sigma^2(s_t)[1 - \rho^2(s_t,s)]$.
In the third approach, based on the observed total score $x$, the linear regression

\[ s_x = E(s) + \rho(s_t,x)[\sigma(s_t)/\sigma(x)][x - E(x)] \]

of $s_t$ on $x$ predicts $s_t$, and the mean-squared error is $\sigma^2(s_t)[1 - \rho^2(s_t,x)]$.
Relative to use of $E(s)$, $\rho^2(s_t,s)$ is the proportional reduction of mean-squared error
from use of the estimate $s_s$ based on the observed subscore, while $\rho^2(s_t,x)$ is the
proportional reduction in mean-squared error from use of the estimate $s_x$ based on the
observed total score. Haberman (2005) argues on the basis of these results that subscores
should not be reported if $\rho^2(s_t,s)$ is less than $\rho^2(s_t,x)$, for the true subscore is
then better approximated by use of the observed total score than by the observed subscore.
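
The comparison just described can be sketched in a few lines. The function names are
hypothetical; the numerical inputs are the Test form 1, Subscore 1 entries of Table 1
(77 and 84, converted to proportions) together with the total-score reliability of 0.94
reported in Section 3.1.

```python
def prmse_from_total(rho2_true_sub_true_total, reliability_total):
    """Equation (1): squared correlation of the true subscore with the observed total."""
    return rho2_true_sub_true_total * reliability_total


def subscore_has_added_value(reliability_sub, prmse_total):
    """Criterion of Haberman (2005): the subscore must beat the total score."""
    return reliability_sub > prmse_total


reliability_sub = 0.77   # rho^2(s_t, s), Table 1, form 1, subscore 1
prmse_total = 0.84       # rho^2(s_t, x), same table entry
reliability_total = 0.94

# Squared true-score correlation implied by equation (1) for these rounded values.
implied_rho2_true = prmse_total / reliability_total
print(round(implied_rho2_true, 3))
print(subscore_has_added_value(reliability_sub, prmse_total))
```

Because the table entries are rounded percentages, the implied true-score correlation is
only approximate.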

Haberman (2005) also considers an option of reporting an estimate of the true subscore
$s_t$ based on the linear regression $s_a$ of $s_t$ on both the observed subscore $s$ and the
observed total score $x$. The study of proportional reduction of mean-squared error also
requires the correlation $\rho(s,x)$ of the subscore $s$ and the total score $x$. The regression is

\[ s_a = E(s) + \beta[s - E(s)] + \gamma[x - E(x)], \]

where

\[ \gamma = \frac{\sigma(s)}{\sigma(x)}\,\rho(s_t,s)\,\tau, \]

\[ \tau = \frac{\rho(x_t,x)\rho(s_t,x_t) - \rho(s,x)\rho(s_t,s)}{1 - \rho^2(s,x)}, \]

and

\[ \beta = \rho(s_t,s)[\rho(s_t,s) - \rho(s,x)\tau]. \]

The mean-squared error is then $\sigma^2(s_t)\{1 - \rho^2(s_t,s) - \tau^2[1 - \rho^2(s,x)]\}$, so
that the proportional reduction in mean-squared error relative to $E(s)$ is

\[ \rho^2(s_t,s_a) = \rho^2(s_t,s) + \tau^2[1 - \rho^2(s,x)]. \]
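
The quantities $\tau$, $\beta$, $\gamma$, and $\rho^2(s_t,s_a)$ can be computed directly
from the four correlations and two standard deviations. The sketch below uses a
hypothetical function name, and the input values are illustrative assumptions only (in
particular, the value 0.80 for $\rho(s,x)$ is not taken from the data).

```python
import math


def combined_regression(rho_st_s, rho_st_xt, rho_xt_x, rho_s_x, sd_s, sd_x):
    """Coefficients and PRMSE for the regression of s_t on both s and x."""
    tau = (rho_xt_x * rho_st_xt - rho_s_x * rho_st_s) / (1.0 - rho_s_x ** 2)
    beta = rho_st_s * (rho_st_s - rho_s_x * tau)
    gamma = (sd_s / sd_x) * rho_st_s * tau
    prmse = rho_st_s ** 2 + tau ** 2 * (1.0 - rho_s_x ** 2)
    return {"tau": tau, "beta": beta, "gamma": gamma, "prmse": prmse}


# Illustrative (assumed) inputs: reliabilities 0.77 and 0.94, true-score
# correlation 0.945, observed correlation 0.80, unit standard deviations.
result = combined_regression(
    rho_st_s=math.sqrt(0.77),
    rho_st_xt=0.945,
    rho_xt_x=math.sqrt(0.94),
    rho_s_x=0.80,
    sd_s=1.0,
    sd_x=1.0,
)
print(result)
```

By construction the combined value is never smaller than either $\rho^2(s_t,s)$ or
$\rho^2(s_t,x)$, so the practical question is whether the gain is large enough to matter.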

Wainer et al. (2001) discuss the idea of augmentation, which means stabilizing the
subscores by augmenting data from any particular subscore with information obtained
from the other subscores. One can perform augmentation using the approach of Haberman
(2005) by considering a linear regression of $s_t$ on $s$ and other observed subscores $u_k$,
$1 \le k \le r$. In the most trivial case, $r = 1$ and $u_1 = x - s$ is the total score minus the
subscore. Let the true score for $u_k$ be $u_{kt}$. Assume that $s_t$ is not a linear function of
$s$ and the $u_k$, $s$ is not a linear function of the $u_k$, and no $u_j$ is a linear function
of the remaining $u_k$. Then one computes

\[ s_u = E(s) + \beta_u[s - E(s)] + \sum_{k=1}^{r} \gamma_k[u_k - E(u_k)], \]

where

\[ \gamma_k = [\sigma(s)/\sigma(u_k)]\,\rho(s_t,s)\,\tau_k, \]

\[ \beta_u = \rho(s_t,s)\Big[\rho(s_t,s) - \sum_{k=1}^{r} \tau_k\,\rho(s,u_k)\Big], \]

and

\[ \sum_{k=1}^{r} [\rho(u_j,u_k) - \rho(s,u_j)\rho(s,u_k)]\,\tau_k = \rho(s_t,u_j) - \rho(s_t,s)\rho(s,u_j) \]

for $1 \le j \le r$. The mean-squared error is

\[ \sigma^2(s_t)\Big\{1 - \rho^2(s_t,s) - \sum_{k=1}^{r} \tau_k[\rho(s_t,u_k) - \rho(s_t,s)\rho(s,u_k)]\Big\}, \]

so that, relative to $E(s)$, the proportional reduction in mean-squared error is

\[ \rho^2(s_t,s_u) = \rho^2(s_t,s) + \sum_{k=1}^{r} \tau_k[\rho(s_t,u_k) - \rho(s_t,s)\rho(s,u_k)]. \]

This generalization appears to offer very little added benefit for the data considered in this
paper. In the trivial case in which $r = 1$ and $u_1 = x - s$, $s_u$ is the same as $s_a$.
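
The linear system that defines the $\tau_k$ can be solved numerically. The following is a
minimal sketch that uses plain Gaussian elimination so that no external library is needed;
the function names and the numerical inputs are hypothetical, and the correlations
$\rho(s_t,u_k)$ are assumed to have been estimated beforehand.

```python
def solve_linear_system(a, b):
    """Solve a t = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    # Work on an augmented copy so the inputs are left untouched.
    m = [list(row) + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    t = [0.0] * n
    for r in reversed(range(n)):
        tail = sum(m[r][c] * t[c] for c in range(r + 1, n))
        t[r] = (m[r][n] - tail) / m[r][r]
    return t


def augmented_prmse(rho_st_s, rho_s_u, rho_st_u, rho_u_u):
    """Return (PRMSE of s_u, list of tau_k) from the correlations in the text."""
    r = len(rho_s_u)
    a = [[rho_u_u[j][k] - rho_s_u[j] * rho_s_u[k] for k in range(r)]
         for j in range(r)]
    b = [rho_st_u[j] - rho_st_s * rho_s_u[j] for j in range(r)]
    taus = solve_linear_system(a, b)
    gain = sum(t * bj for t, bj in zip(taus, b))
    return rho_st_s ** 2 + gain, taus


# Hypothetical single-predictor case (r = 1).
prmse, taus = augmented_prmse(0.88, [0.7], [0.85], [[1.0]])
print(prmse, taus)
```

With a single predictor the result reduces to the two-predictor formula
$\rho^2(s_t,s) + \tau^2[1 - \rho^2(s,u_1)]$, which gives a convenient check.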


2.2 Institutional-Level Analysis of Mean-Squared Error

At the institutional level, the analysis of Section 2.1 must be modified by decomposition
of scores and subscores into institutional and individual components. Thus subscore $s$
has the decomposition $s = s_I + s_e$, where $s_I$ (the component for the institution of the
examinee) is the same for each examinee in that institution and has mean $E(s)$ and variance
$\sigma^2(s_I) > 0$. The score $x$ has the decomposition $x = x_I + x_e$, where $x_I$ (the
component for the institution of the examinee) is the same for each examinee in that
institution and has mean $E(x)$ and variance $\sigma^2(x_I) > 0$. The residual examinee subscore
$s_e = s - s_I$ within institution has mean 0, variance $\sigma^2(s_e) > 0$, and is uncorrelated
with the institutional means $s_I$ and $x_I$. The residual examinee total score $x_e = x - x_I$
within institution has mean 0, variance $\sigma^2(x_e) > 0$, and is uncorrelated with $s_I$ and
$x_I$. The analysis is not directly concerned with the true scores and errors of Section 2.1,
but it should be noted that, under classical assumptions, $s_e = (s_t - s_I) + (s - s_t)$ and
$s_t - s_I$ and $s - s_t$ are uncorrelated, so that $s_t - s_I$ has mean 0 and variance
$\sigma^2(s_e) - \sigma^2(s - s_t)$. In like fashion, $x_e = (x_t - x_I) + (x - x_t)$ and
$x_t - x_I$ and $x - x_t$ are uncorrelated, so that $x_t - x_I$ has mean 0 and variance
$\sigma^2(x_e) - \sigma^2(x - x_t)$. It is assumed that $s_I$ and $x_I$ do not have a correlation
of 1 or $-1$.

If $n$ examinees are observed from a given institution and if $\bar{s}$ is the average subscore
for examinees from that institution, then $\bar{s} = s_I + \bar{s}_e$, where $\bar{s}_e$ is
uncorrelated with $s_I$ and $x_I$ and has mean 0 and variance $\sigma^2(s_e)/n$. Thus $\bar{s}$
has variance $\sigma^2(s_I) + \sigma^2(s_e)/n$. For the institution, the squared correlation of
the institutional mean $s_I$ and the average $\bar{s}$ is then the reliability

\[ \rho^2(s_I,\bar{s}) = \frac{\sigma^2(s_I)}{\sigma^2(s_I) + \sigma^2(s_e)/n}. \tag{2} \]

Similarly, if $\bar{x}$ is the average total score for examinees from that institution, then
$\bar{x} = x_I + \bar{x}_e$, where $\bar{x}_e$ is uncorrelated with $s_I$ and $x_I$ and has mean 0
and variance $\sigma^2(x_e)/n$. Thus $\bar{x}$ has variance $\sigma^2(x_I) + \sigma^2(x_e)/n$. The
squared correlation of the institutional mean $x_I$ and the average $\bar{x}$ is then the
reliability

\[ \rho^2(x_I,\bar{x}) = \frac{\sigma^2(x_I)}{\sigma^2(x_I) + \sigma^2(x_e)/n}. \]

Analysis also requires the squared correlation $\rho^2(s_I,x_I)$ of institutional mean subscore
$s_I$ and institutional mean score $x_I$. This calculation may be accomplished by multivariate
analysis of variance, as is shown in Section 2.3. Given this squared correlation,

\[ \rho^2(s_I,\bar{x}) = \rho^2(s_I,x_I)\,\rho^2(x_I,\bar{x}). \tag{3} \]

Analogous to results in Section 2.1, if $E(s) = E(s_t)$ is used to predict $s_I$, then the
mean-squared error is $\sigma^2(s_I)$. If

\[ s_{Is} = E(s) + \rho^2(s_I,\bar{s})[\bar{s} - E(s)], \]

the linear regression of $s_I$ on $\bar{s}$, is used to predict $s_I$, then the mean-squared
error is $\sigma^2(s_I)[1 - \rho^2(s_I,\bar{s})]$. If linear regression of $s_I$ on $\bar{x}$ is
used to predict $s_I$ by use of

\[ s_{Ix} = E(s) + \rho(s_I,\bar{x})[\sigma(s_I)/\sigma(\bar{x})][\bar{x} - E(x)], \]

then the mean-squared error is $\sigma^2(s_I)[1 - \rho^2(s_I,\bar{x})]$. Relative to use of
$E(s)$, the proportional reduction of mean-squared error from use of the linear regression based
on $\bar{s}$ is the reliability $\rho^2(s_I,\bar{s})$, while the proportional reduction in
mean-squared error from use of the linear regression based on $\bar{x}$ is
$\rho^2(s_I,\bar{x})$. Thus a basic requirement for reporting an institutional subscore is that
$\rho^2(s_I,\bar{s})$ be greater than $\rho^2(s_I,\bar{x})$. Otherwise, the average total score
for the institution predicts the institutional subscore mean $s_I$ better than does the average
subscore for the institution. For any nontrivial estimation of the institutional mean $s_I$, it
is clearly best if the number of examinees $n$ for the institution is relatively large. In
addition, for $n$ sufficiently large, $\bar{s}$ is a better predictor of $s_I$ than is $\bar{x}$.
The problem in practice is to ascertain when the sample size for an institution is large enough
for the subscores to be worth reporting.
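
The sample-size question can be explored numerically by evaluating equations (2) and (3)
for increasing $n$. The sketch below is only an illustration: the function names are
hypothetical and the variance components are made-up values, not estimates from the data
analyzed in this paper.

```python
def mean_reliability(var_inst, var_within, n):
    """Reliability of an institutional mean based on n examinees, as in equation (2)."""
    return var_inst / (var_inst + var_within / n)


def minimum_reporting_size(var_s_inst, var_s_within, var_x_inst, var_x_within,
                           rho2_inst, max_n=10000):
    """Smallest n for which the mean subscore beats the mean total score, or None."""
    for n in range(1, max_n + 1):
        prmse_sub = mean_reliability(var_s_inst, var_s_within, n)
        # Equation (3): PRMSE of the mean total score as a predictor of s_I.
        prmse_total = rho2_inst * mean_reliability(var_x_inst, var_x_within, n)
        if prmse_sub > prmse_total:
            return n
    return None


# Made-up variance components: sigma^2(s_I) = 1, sigma^2(s_e) = 4,
# sigma^2(x_I) = 9, sigma^2(x_e) = 30, rho^2(s_I, x_I) = 0.98.
print(minimum_reporting_size(1.0, 4.0, 9.0, 30.0, 0.98))
```

When the institutional components are perfectly correlated, no finite $n$ satisfies the
criterion in this example and the function returns `None`.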

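Combined use of $\bar{s}$ and $\bar{x}$ to predict $s_I$ is also possible by use of the same
argument as in Haberman (2005). In this case, the regression of $s_I$ on $\bar{s}$ and $\bar{x}$ is

\[ s_{Ia} = E(s) + \beta_I[\bar{s} - E(s)] + \gamma_I[\bar{x} - E(x)], \]

where

\[ \gamma_I = \frac{\sigma(\bar{s})}{\sigma(\bar{x})}\,\rho(s_I,\bar{s})\,\tau_I, \]

\[ \tau_I = \frac{\rho(x_I,\bar{x})\rho(s_I,x_I) - \rho(\bar{s},\bar{x})\rho(s_I,\bar{s})}{1 - \rho^2(\bar{s},\bar{x})}, \]

and

\[ \beta_I = \rho(s_I,\bar{s})[\rho(s_I,\bar{s}) - \rho(\bar{s},\bar{x})\tau_I]. \]

The mean-squared error is then
$\sigma^2(s_I)\{1 - \rho^2(s_I,\bar{s}) - \tau_I^2[1 - \rho^2(\bar{s},\bar{x})]\}$, so that the
proportional reduction in mean-squared error relative to $E(s)$ is

\[ \rho^2(s_I,s_{Ia}) = \rho^2(s_I,\bar{s}) + \tau_I^2[1 - \rho^2(\bar{s},\bar{x})]. \tag{4} \]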


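As in the augmentation approach of Wainer et al. (2001), one may consider the
decomposition $u_k = u_{Ik} + u_{ek}$, where $s_I$ and $u_{Ik}$ are uncorrelated with $s_e$ and
$u_{ek}$. Given standard assumptions to prevent collinearity of predictors, $s_I$ is predicted
by the institutional means $\bar{s}$ for $s$ and $\bar{u}_k$ for $u_k$. The predictor $s_{Iu}$ is
then

\[ s_{Iu} = E(s) + \beta_{Iu}[\bar{s} - E(s)] + \sum_{k=1}^{r} \gamma_{Ik}[\bar{u}_k - E(u_k)], \]

where

\[ \gamma_{Ik} = [\sigma(s_I)/\sigma(\bar{u}_k)]\,\tau_{Ik}, \]

\[ \beta_{Iu} = \rho(s_I,\bar{s})\Big[\rho(s_I,\bar{s}) - \sum_{k=1}^{r} \tau_{Ik}\,\rho(\bar{s},\bar{u}_k)\Big], \]

and

\[ \sum_{k=1}^{r} [\rho(\bar{u}_j,\bar{u}_k) - \rho(\bar{s},\bar{u}_j)\rho(\bar{s},\bar{u}_k)]\,\tau_{Ik} = \rho(s_I,\bar{u}_j) - \rho(s_I,\bar{s})\rho(\bar{s},\bar{u}_j) \]

for $1 \le j \le r$. The mean-squared error is

\[ \sigma^2(s_I)\Big\{1 - \rho^2(s_I,\bar{s}) - \sum_{k=1}^{r} \tau_{Ik}[\rho(s_I,\bar{u}_k) - \rho(s_I,\bar{s})\rho(\bar{s},\bar{u}_k)]\Big\}, \]

so that, relative to $E(s)$, the proportional reduction in mean-squared error is

\[ \rho^2(s_I,s_{Iu}) = \rho^2(s_I,\bar{s}) + \sum_{k=1}^{r} \tau_{Ik}[\rho(s_I,\bar{u}_k) - \rho(s_I,\bar{s})\rho(\bar{s},\bar{u}_k)]. \]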

2.3 Institutional-Level Estimation Procedure

Estimation of the means, variances, and correlations required for an institutional analysis
requires mean squares and mean cross products customarily associated with a one-way
multivariate analysis of variance (MANOVA) with dependent variables for the observed
total score and observed subscore. Let a sample be available with $n_j$ scores from institution
$j$, $1 \le j \le J$. Let $N$ be the total number of examinees from all institutions. Assume that
$N > J$. For examinee $i$ of institution $j$, let the total score be $x_{ij}$, and let the subscore
be $s_{ij}$. Let $\bar{x}_j$ be the average total score from institution $j$, and let $\bar{s}_j$
be the average subscore from institution $j$. Let $\bar{x}_\cdot$ be the mean total score for all
examinees, and let $\bar{s}_\cdot$ be the mean subscore for all examinees. Let the
within-institution mean square for total score be

\[ M_{xxe} = (N - J)^{-1} \sum_{j=1}^{J} \sum_{i=1}^{n_j} (x_{ij} - \bar{x}_j)^2, \]

let the within-institution mean square for subscore be

\[ M_{sse} = (N - J)^{-1} \sum_{j=1}^{J} \sum_{i=1}^{n_j} (s_{ij} - \bar{s}_j)^2, \]

and let the within-institution mean cross product for subscore and total score be

\[ M_{sxe} = (N - J)^{-1} \sum_{j=1}^{J} \sum_{i=1}^{n_j} (s_{ij} - \bar{s}_j)(x_{ij} - \bar{x}_j). \]

Let $\bar{x}_\cdot$, the mean total score for all examinees, be used to estimate $E(x)$, and let
$\bar{s}_\cdot$, the mean subscore for all examinees, be used to estimate $E(s)$. Then the
between-institution mean square for the total score is

\[ M_{xxI} = (J - 1)^{-1} \sum_{j=1}^{J} n_j(\bar{x}_j - \bar{x}_\cdot)^2, \]

the between-institution mean square for the subscore is

\[ M_{ssI} = (J - 1)^{-1} \sum_{j=1}^{J} n_j(\bar{s}_j - \bar{s}_\cdot)^2, \]

and the between-institution mean cross product for subscore and total score is

\[ M_{sxI} = (J - 1)^{-1} \sum_{j=1}^{J} n_j(\bar{s}_j - \bar{s}_\cdot)(\bar{x}_j - \bar{x}_\cdot). \]
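
The six mean squares and mean cross products can be computed directly from examinee-level
records. The following is a minimal sketch in plain Python; the function name, the input
layout (a mapping from an institution label to a list of subscore and total score pairs),
and the toy data are all hypothetical.

```python
def manova_mean_squares(groups):
    """Within- and between-institution mean squares and mean cross products."""
    sizes = {j: len(rows) for j, rows in groups.items()}
    n_total = sum(sizes.values())
    n_inst = len(groups)
    s_bar = {j: sum(s for s, _ in rows) / sizes[j] for j, rows in groups.items()}
    x_bar = {j: sum(x for _, x in rows) / sizes[j] for j, rows in groups.items()}
    s_all = sum(s for rows in groups.values() for s, _ in rows) / n_total
    x_all = sum(x for rows in groups.values() for _, x in rows) / n_total

    within = {"ss": 0.0, "xx": 0.0, "sx": 0.0}
    between = {"ss": 0.0, "xx": 0.0, "sx": 0.0}
    for j, rows in groups.items():
        for s, x in rows:
            ds, dx = s - s_bar[j], x - x_bar[j]
            within["ss"] += ds * ds
            within["xx"] += dx * dx
            within["sx"] += ds * dx
        ds, dx = s_bar[j] - s_all, x_bar[j] - x_all
        between["ss"] += sizes[j] * ds * ds
        between["xx"] += sizes[j] * dx * dx
        between["sx"] += sizes[j] * ds * dx

    df_within, df_between = n_total - n_inst, n_inst - 1
    return {
        "M_sse": within["ss"] / df_within,
        "M_xxe": within["xx"] / df_within,
        "M_sxe": within["sx"] / df_within,
        "M_ssI": between["ss"] / df_between,
        "M_xxI": between["xx"] / df_between,
        "M_sxI": between["sx"] / df_between,
    }


# Toy data: two institutions with (subscore, total score) pairs.
toy = {"A": [(1, 2), (3, 4)], "B": [(5, 7), (7, 9), (9, 11)]}
print(manova_mean_squares(toy))
```

The function assumes at least two institutions and more examinees than institutions, in
line with the condition $N > J$ stated above.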

As in Snedecor and Cochran (1989), $\sigma^2(s_e)$ is normally estimated by
$\hat\sigma^2(s_e) = M_{sse}$, and $\sigma^2(x_e)$ is normally estimated by
$\hat\sigma^2(x_e) = M_{xxe}$. For the remaining required estimates, let $n$ denote the number
of examinees from an institution, let

\[ C = 1 - \sum_{j=1}^{J} (n_j/N)^2 \]

measure dispersion of examinees across institutions (Gini, 1912), and let

\[ K = NC/(J - 1). \]

Note that $C \le (J - 1)/J$, with equality only if all $n_j$ are equal, and $K \le N/J$, with
equality only if all $n_j$ are equal. Then $\sigma^2(s_I)$ has estimate

\[ \hat\sigma^2(s_I) = K^{-1}(M_{ssI} - M_{sse}), \]

$\sigma^2(x_I)$ has estimate

\[ \hat\sigma^2(x_I) = K^{-1}(M_{xxI} - M_{xxe}), \]

$\sigma^2(\bar{s})$ has estimate

\[ \hat\sigma^2(\bar{s}) = \hat\sigma^2(s_I) + \hat\sigma^2(s_e)/n, \]

$\sigma^2(\bar{x})$ has estimate

\[ \hat\sigma^2(\bar{x}) = \hat\sigma^2(x_I) + \hat\sigma^2(x_e)/n, \]

the covariance $c(s_e,x_e)$ of $s_e$ and $x_e$ has estimate

\[ \hat{c}(s_e,x_e) = M_{sxe}, \]

the covariance $c(s_I,x_I)$ of $s_I$ and $x_I$ has estimate

\[ \hat{c}(s_I,x_I) = K^{-1}(M_{sxI} - M_{sxe}), \]

the covariance $c(\bar{s},\bar{x})$ of $\bar{s}$ and $\bar{x}$ has estimate

\[ \hat{c}(\bar{s},\bar{x}) = \hat{c}(s_I,x_I) + \hat{c}(s_e,x_e)/n, \]

the covariance $c(s_I,\bar{x})$ of $s_I$ and $\bar{x}$ has estimate

\[ \hat{c}(s_I,\bar{x}) = \hat{c}(s_I,x_I), \]

$\rho^2(s_I,\bar{s})$ has estimate

\[ \hat\rho^2(s_I,\bar{s}) = \hat\sigma^2(s_I)/\hat\sigma^2(\bar{s}), \]

$\rho^2(x_I,\bar{x})$ has estimate

\[ \hat\rho^2(x_I,\bar{x}) = \hat\sigma^2(x_I)/\hat\sigma^2(\bar{x}), \]

$\rho(s_I,x_I)$ has estimate

\[ \hat\rho(s_I,x_I) = \hat{c}(s_I,x_I)/[\hat\sigma(s_I)\hat\sigma(x_I)], \]

$\rho(\bar{s},\bar{x})$ has estimate

\[ \hat\rho(\bar{s},\bar{x}) = \hat{c}(\bar{s},\bar{x})/[\hat\sigma(\bar{s})\hat\sigma(\bar{x})], \]

$\rho(s_I,\bar{x})$ has estimate

\[ \hat\rho(s_I,\bar{x}) = \hat{c}(s_I,\bar{x})/[\hat\sigma(s_I)\hat\sigma(\bar{x})], \]

and $\tau_I$ has estimate

\[ \hat\tau_I = \frac{\hat\rho(x_I,\bar{x})\hat\rho(s_I,x_I) - \hat\rho(\bar{s},\bar{x})\hat\rho(s_I,\bar{s})}{1 - \hat\rho^2(\bar{s},\bar{x})}. \]

Results for augmentation are derived by very similar arguments, so that details are omitted.
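
The constants $C$ and $K$ and the resulting variance-component estimates are simple to
compute from the institution sizes and the mean squares. The sketch below uses hypothetical
function names and made-up institution sizes and mean squares.

```python
def dispersion_constants(sizes):
    """Return (C, K) for a list of institution sizes n_j."""
    n_total = sum(sizes)
    n_inst = len(sizes)
    c = 1.0 - sum((n_j / n_total) ** 2 for n_j in sizes)
    k = n_total * c / (n_inst - 1)
    return c, k


def variance_component(ms_between, ms_within, k):
    """Estimate of an institutional variance or covariance: (M_I - M_e) / K."""
    return (ms_between - ms_within) / k


# Made-up sizes for five institutions and made-up subscore mean squares.
sizes = [2, 3, 5, 10, 40]
c, k = dispersion_constants(sizes)
var_s_inst = variance_component(ms_between=20.0, ms_within=4.0, k=k)
print(c, k, var_s_inst)
```

With equal institution sizes the constants reduce to $C = (J - 1)/J$ and $K = N/J$, the
common institution size; unequal sizes give smaller values of both.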

Some changes in procedure are necessary in special cases. The following simple rules
appear adequate in practice, although the approach of Bock and Peterson (1975) is worth
consideration even if the ideal condition that all $n_j$ are equal does not hold. If
$M_{ssI} \le M_{sse}$, then no evidence exists that $s_I$ has a positive variance, so that
$s_{Is}$, $s_{Ix}$, $s_{Ia}$, and $s_{Iu}$ are all approximated by $\bar{s}_\cdot$, and all
proportional reductions in mean-squared error may be approximated by 0. If
$M_{ssI} > M_{sse}$ but $M_{xxI} \le M_{xxe}$, then no evidence exists that $x_I$ has a positive
variance, so that $s_{Ix}$ is approximated by $\bar{s}_\cdot$, and $s_{Ia}$ and $s_{Is}$ have the
same approximation. Thus the proportional reduction in mean-squared error for $s_{Ix}$ is
estimated by 0, and the proportional reductions in mean-squared error for $s_{Is}$ and $s_{Ia}$
are estimated to be the same. If $M_{ssI} > M_{sse}$ and $M_{xxI} > M_{xxe}$ but

\[ (M_{sxI} - M_{sxe})^2 \ge (M_{ssI} - M_{sse})(M_{xxI} - M_{xxe}), \]

so that the normal estimate of $\rho^2(s_I,x_I)$ is greater than or equal to 1, then no evidence
exists that $s_I$ is not a linear function of $x_I$. In this instance, $s_{Ia}$ and $s_{Ix}$ are
estimated to be the same, $\hat{c}(s_I,x_I)$ is set to $\hat\sigma(s_I)\hat\sigma(x_I)$, and
$\hat\rho^2(s_I,s_{Ia})$ and $\hat\rho^2(s_I,\bar{x})$ are set equal to $\hat\rho^2(x_I,\bar{x})$.
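
The estimation steps and the three special-case rules can be combined in one routine. The
following sketch is an assumed arrangement of the formulas of this section, not the SAS
code used for the paper; the function name, argument names, and returned keys are
hypothetical.

```python
import math


def institutional_prmse(m_ss_i, m_ss_e, m_xx_i, m_xx_e, m_sx_i, m_sx_e, k, n):
    """Estimated PRMSEs for the mean subscore, the mean total score, and both."""
    var_s_i = (m_ss_i - m_ss_e) / k
    var_x_i = (m_xx_i - m_xx_e) / k
    cov_i = (m_sx_i - m_sx_e) / k

    # Rule 1: no evidence that s_I has a positive variance.
    if var_s_i <= 0:
        return {"subscore": 0.0, "total": 0.0, "both": 0.0}

    var_s_bar = var_s_i + m_ss_e / n
    rel_s = var_s_i / var_s_bar                      # equation (2)

    # Rule 2: no evidence that x_I has a positive variance.
    if var_x_i <= 0:
        return {"subscore": rel_s, "total": 0.0, "both": rel_s}

    var_x_bar = var_x_i + m_xx_e / n
    rel_x = var_x_i / var_x_bar

    # Rule 3: estimated squared correlation of s_I and x_I is at least 1.
    if cov_i ** 2 >= var_s_i * var_x_i:
        return {"subscore": rel_s, "total": rel_x, "both": rel_x}

    prmse_x = cov_i ** 2 / (var_s_i * var_x_i) * rel_x   # equation (3)

    cov_bar = cov_i + m_sx_e / n
    rho_bar = cov_bar / math.sqrt(var_s_bar * var_x_bar)
    rho_si_xbar = cov_i / math.sqrt(var_s_i * var_x_bar)
    tau = (rho_si_xbar - rho_bar * math.sqrt(rel_s)) / (1.0 - rho_bar ** 2)
    prmse_a = rel_s + tau ** 2 * (1.0 - rho_bar ** 2)    # equation (4)
    return {"subscore": rel_s, "total": prmse_x, "both": prmse_a}


# Made-up mean squares, K = 4, and an institution of n = 30 examinees.
print(institutional_prmse(20.0, 4.0, 100.0, 30.0, 38.0, 8.0, k=4.0, n=30))
```

The numerator of `tau` uses the identity
$\hat\rho(s_I,\bar{x}) = \hat\rho(x_I,\bar{x})\hat\rho(s_I,x_I)$, which follows from the
estimates listed above.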

Required computations may be performed with the help of the SAS NESTED and
GLM procedures (SAS Institute, 1996).

3. Results


The subscore analyses from Section 2 were applied to two administrations (forms) of 
a basic skills test belonging to the PRAXIS series. This test is designed for prospective 
and practicing paraprofessionals (i.e., teacher’s aides) and measures skills and knowledge in 
reading, mathematics, and writing, as well as the ability to apply those skills and knowledge 
to aid in classroom instruction. Results were initially considered for six different subscores, 
namely theory and application of mathematics, theory and application of reading, and 
theory and application of writing. Although the results for six subscores were of primary 
interest, we further examined the results for the case with the three subscores for writing, 
mathematics, and reading that were obtained by pooling the theory and application 
portions of each of the three content areas. A final analysis pooled the reading and writing 
parts into one verbal subscore and retained the mathematics subscore. For each of the data 
sets, about a fourth of the examinees did not report their institutions. As a consequence, 
these examinees were removed from the analysis. The precise effect of this omission cannot 
be readily determined. Even after removing these examinees, the numbers of examinees for
the two test forms were 3,240 and 2,497, respectively. The respective numbers of institutions
were 712 and 654. The number of students $n_j$ in an institution $j$ ranged from 1 to 160 in
these data, with the median size being 2 for both test forms, the 75th percentile being 4
for both test forms, the 95th percentile being 16 and 14 for the respective test forms, and
the 99th percentile being 41 and 28 for the respective test forms. Given these numbers,
the number of institutions for which any score reports are possible is clearly quite limited
unless reports combine more than one administration.

3.1 Examinee-Level Analysis of Mean-Squared Error

For the total score, the reliability for both test forms was 0.94. The first two
rows of Tables 1, 2, and 3 show the estimates of subscore reliability $\rho^2(s_t,s)$ and the
proportional reduction $\rho^2(s_t,x)$ given by (1), both from individual-level analysis and
expressed as percentages. The values indicate that the true subscores are substantially more
highly correlated with the observed total score than with the observed subscores, so
that individual-level subscores should not be reported. In addition, the eigenvalues were
computed from the $6 \times 6$ estimated correlation matrix of the individual subscores. Figure
1 shows the corresponding scree plots (Cattell, 1956) for the two test forms. The figure
strongly suggests that a single composite score exists such that each subscore can be very
well approximated by use of a linear transformation of the composite score.

The results should not come as a big surprise as other studies also found subscores to 
have little added value. For instance, Harris and Hanson (1991) found subscores to have 
little added value for the English and mathematics tests from the P-ACT+ examination, 
and Haberman (2005) found subscores to have little added value for the SAT® I verbal and 
mathematics examinations. 

3.2 Institutional Analysis 

At the institutional level, results were obtained for numbers n of examinees per 
institution of 30, 100, and 150. Because the maximum number of students in an institution 
is 160, the upper bound of 150 appeared reasonable for the application. Tables 1, 2, and 
3 show the proportional reductions in mean-squared error for these values of n for six 
subscores, three subscores, and two subscores, respectively. 

The last nine rows of each table show, for $n = 30$, 100, and 150, the values of the
institutional-level proportional reductions (expressed as percentages) discussed earlier and
given by (3), (2), and (4), respectively.

Figure 2 compares the proportional reductions of mean-squared error at the institutional 
level for observed means of total scores and for observed means of subscores for six subscores, 
three subscores, and two subscores for each of the two test forms. 

The tables and the figure reveal the following: 

• On several occasions, $M_{ssI} > M_{sse}$, $M_{xxI} > M_{xxe}$, and

\[ (M_{sxI} - M_{sxe})^2 \ge (M_{ssI} - M_{sse})(M_{xxI} - M_{xxe}), \]

so that $\rho^2(s_I,s_{Ia})$ and $\rho^2(s_I,\bar{x})$ are set equal to $\rho^2(x_I,\bar{x})$.
This indicates rather small between-institution variation.

• The criterion of mean-squared error consistently favors prediction of institutional
subscore means by observed institutional total score means rather than by observed
institutional subscore means. The observed institutional subscore means come close to
being favored only for two subscores and at least 100 examinees, as can be observed from
Table 3. Again, this result is not a big surprise, as Longford (1990) also found subscores
to have little added value for one of the tests considered.

• Use of both observed subscore mean and observed total score mean generally provides
only relatively small gains over use of observed subscores and hardly any gain over use
of observed total scores.

• Results vary appreciably from form to form.

• Although reporting institutional subscore means has little justification in the
preponderance of cases, reporting such means does not necessarily lead to poor estimates, for
the reliability at the institutional level is generally high.

Figure 2. Comparison of proportional reduction of mean-squared error of institutional-level
observed total scores and institutional-level observed subscores.

Note. In any plot, the three solid lines show the proportional reduction in mean-squared error of
institutional-level observed total scores, a lower line indicating smaller institution size, and the
three dashed lines show the proportional reduction of mean-squared error of institutional-level
observed subscores, a lower line indicating smaller institution size.

Table 1.

Percent Reduction (100 × Proportional Reduction)
in Mean-Squared Error With Six Subscores

                           Test form 1, subscore               Test form 2, subscore
Quantity              n    1     2     3     4     5     6     1     2     3     4     5     6
$\rho^2(s_t,s)$            77    71    77    73    75    74    78    75    79    58    76    75
$\rho^2(s_t,x)$            84    91    83    88    81    81    88    91    86    83    83    83
$\rho^2(s_I,\bar x)$  30   92^a  92^a  91    92^a  92^a  92^a  90^a  90    89    90    90^a  90
                      100  98^a  98^a  96    98^a  98^a  98^a  97^a  97    96    97    97^a  97
                      150  98^a  98^a  97    98^a  98^a  98^a  98^a  98    97    98    98^a  98
$\rho^2(s_I,\bar s)$  30   89    89    89    89    87    84    88    86    89    86    77    85
                      100  97    97    97    96    96    95    96    95    96    95    93    95
                      150  98    98    98    98    97    96    97    97    98    97    94    97
$\rho^2(s_I,s_{Ia})$  30   92^a  92^a  91    92^a  92^a  92^a  90^a  90    90    90    90^a  90
                      100  98^a  98^a  97    98^a  98^a  98^a  97^a  97    97    97    97^a  97
                      150  98^a  98^a  98    98^a  98^a  98^a  98^a  98    98    98    98^a  99

^a For the corresponding subscore, $M_{ssI} > M_{sse}$, $M_{xxI} > M_{xxe}$, and
$(M_{sxI} - M_{sxe})^2 \ge (M_{ssI} - M_{sse})(M_{xxI} - M_{xxe})$, so that
$\rho^2(s_I,s_{Ia})$ and $\rho^2(s_I,\bar{x})$ are set equal to $\rho^2(x_I,\bar{x})$.

The results for augmented subscores $s_{Iu}$ are not provided, primarily because they result
in hardly any added benefit for the data.

Table 2.

Percent Reduction (100 × Proportional Reduction)
in Mean-Squared Error With Three Subscores

                           Test form 1, subscore   Test form 2, subscore
Quantity              n    1       2       3       1       2       3
$\rho^2(s_t,s)$            85.3    85.5    84.1    86.5    83.7    85.2
$\rho^2(s_t,x)$            87.3    85.9    86.9    89.5    85.4    86.8
$\rho^2(s_I,\bar x)$  30   92.3^a  92.3^a  92.3^a  90.1^a  89.1    90.1^a
                      100  97.6^a  97.6^a  97.6^a  96.8^a  95.8    96.8^a
                      150  98.4^a  98.4^a  98.4^a  97.8^a  96.8    97.8^a
$\rho^2(s_I,\bar s)$  30   91.0    90.7    88.6    88.9    89.8    83.7
                      100  97.1    97.0    96.3    96.4    96.7    94.5
                      150  98.1    98.0    97.5    97.6    97.8    96.2
$\rho^2(s_I,s_{Ia})$  30   92.3^a  92.3^a  92.3^a  90.1^a  90.3    90.1^a
                      100  97.6^a  97.6^a  97.6^a  96.8^a  96.8    96.8^a
                      150  98.4^a  98.4^a  98.4^a  97.8^a  97.8    97.8^a

^a For the corresponding subscore, $M_{ssI} > M_{sse}$, $M_{xxI} > M_{xxe}$, and
$(M_{sxI} - M_{sxe})^2 \ge (M_{ssI} - M_{sse})(M_{xxI} - M_{xxe})$, so that
$\rho^2(s_I,s_{Ia})$ and $\rho^2(s_I,\bar{x})$ are set equal to $\rho^2(x_I,\bar{x})$.

Table 3.

Percent Reduction (100 × Proportional Reduction)
in Mean-Squared Error With Two Subscores

                               Test form 1       Test form 2
                                Subscore          Subscore
                         n     1       2         1       2

$\rho^2(s_t, s)$               91.2    85.5      92.0    83.7

$\rho^2(s_t, x)$               91.4    85.9      92.1    85.4

$\rho^2(s_I, x)$        30     92.3^a  92.3^a    89.8    89.1
                       100     97.6^a  97.6^a    96.5    95.8
                       150     98.4^a  98.4^a    97.5    96.8

$\rho^2(s_I, s)$        30     91.5    90.7      88.4    89.8
                       100     97.3    97.0      96.2    96.7
                       150     98.2    98.0      97.5    97.8

$\rho^2(s_I, s_{Ia})$   30     92.3^a  92.3^a    89.9    90.3
                       100     97.6^a  97.6^a    96.5    96.8
                       150     98.4^a  98.4^a    97.6    97.8

^a For the corresponding subscore, $M_{ssI} > M_{sse}$, $M_{xxI} > M_{xxe}$, and
$(M_{sxI} - M_{sxe})^2 > (M_{ssI} - M_{sse})(M_{xxI} - M_{xxe})$, so that
$\rho^2(s_I, s_{Ia})$ and $\rho^2(s_I, x)$ are set equal to $\rho^2(x_I, x)$.


3.3 Multivariate Analysis of Variance

A basic difficulty that is encountered with these data can be explored by canonical
analysis for a one-way multivariate analysis of variance (MANOVA) on the six subscores
(Bock, 1975, chapter 6). As in the discussions of augmentation, let the subscores be denoted
by $u_k$ for $1 \le k \le r = 6$. Let $C_I$ be the institutional covariance matrix (see Note 1)
with row $k$ and column $k'$ equal to $c(u_{Ik}, u_{Ik'})$, and let $C_e$ be the error covariance
matrix with row $k$ and column $k'$ equal to $c(u_k - u_{Ik}, u_{k'} - u_{Ik'})$. Consider the
system of relative eigenvalues $\lambda_k$ and normalized relative eigenvectors $v_k$ such that

$$C_I v_k = \lambda_k C_e v_k$$

for $1 \le k \le r$, $\lambda_k \ge \lambda_{k+1}$ for $k < r$, $v_k' C_e v_k = 1$ for $1 \le k \le r$,
and $v_k' C_e v_{k'} = 0$ for $k \ne k'$.
For $n$ examinees from an institution, the maximum possible value of $\rho^2(d_I, d)$ for a linear
combination $d$ of the $u_k$ with institutional mean $d_I$ is $\lambda_1/(\lambda_1 + 1/n)$. This
maximum is achieved if $d = v_1' u$ for $u$ with coordinates $u_k$ for $1 \le k \le r$. Thus, in terms
of institutional reliability, $d$ can be regarded as the optimal linear combination of subscores.
For a linear combination $f$ of the $u_k$ with institutional mean $f_I$ such that $f_e = f - f_I$ and
$d_e = d - d_I$ are uncorrelated, $\rho^2(f_I, f)$ cannot exceed $\lambda_2/(\lambda_2 + 1/n)$. The
upper bound is achieved for $f = v_2' u$. Thus, in terms of institutional reliability, $f$ may be
termed the second optimal linear combination of subscores because $f$ is the optimal linear
combination of subscores subject to the constraint that $f_e$ and $d_e$ are uncorrelated.
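The relative eigenproblem above can be illustrated numerically. The following is a minimal sketch, not the authors' software: it assumes NumPy is available, the function names `relative_eigen` and `max_reliability` are hypothetical, and the two 3 by 3 matrices are made-up values chosen only so that the error covariance matrix is positive definite. The reduction to a symmetric eigenproblem through a Cholesky factor is one standard way to solve such a system.

```python
import numpy as np

def relative_eigen(C_I, C_e):
    """Solve C_I v = lam * C_e v with v' C_e v = 1, eigenvalues in decreasing order."""
    L = np.linalg.cholesky(C_e)          # C_e = L L'
    L_inv = np.linalg.inv(L)
    S = L_inv @ C_I @ L_inv.T            # symmetric matrix with the same eigenvalues
    lam, W = np.linalg.eigh(S)           # eigh returns ascending eigenvalues
    order = np.argsort(lam)[::-1]
    lam, W = lam[order], W[:, order]
    V = L_inv.T @ W                      # columns are the normalized eigenvectors v_k
    return lam, V

def max_reliability(lam, n):
    """Upper bound lam / (lam + 1/n) for the mean of n examinees."""
    return lam / (lam + 1.0 / n)

# Hypothetical covariance matrices (illustration only, not from the paper).
C_e = np.array([[4.0, 1.0, 0.5],
                [1.0, 3.0, 0.8],
                [0.5, 0.8, 5.0]])
C_I = np.array([[1.0, 0.9, 0.8],
                [0.9, 1.0, 0.85],
                [0.8, 0.85, 1.2]])

lam, V = relative_eigen(C_I, C_e)
for n in (30, 100, 150):
    # Bounds for the optimal and second optimal linear combinations.
    print(n, max_reliability(lam[0], n), max_reliability(lam[1], n))
```

The first column of `V` gives the weights of the optimal linear combination and the second column gives the weights of the second optimal combination; the normalization makes the error variance of each combination equal to 1, which is why the bound depends only on the eigenvalue and on n.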

To estimate $\lambda_k$ and $v_k$ for the required values of $k$, the canonical analysis from a
one-way MANOVA may be used. For examinee $i$ of institution $j$, let the subscore value for
$u_k$ be $u_{ijk}$, and let $\bar{u}_{jk}$ be the average subscore $u_k$ from institution $j$. Let
$\bar{u}_{\cdot k}$ be the mean subscore $u_k$ for all examinees, let $M_e$ be the $r$ by $r$
within-institution matrix of mean cross products with row $k$ and column $k'$ equal to

$$M_{kk'e} = (N - J)^{-1} \sum_{j=1}^{J} \sum_{i=1}^{n_j} (u_{ijk} - \bar{u}_{jk})(u_{ijk'} - \bar{u}_{jk'}),$$

and let $M_I$ be the between-institution matrix of mean cross products with row $k$ and
column $k'$ equal to

$$M_{kk'I} = (J - 1)^{-1} \sum_{j=1}^{J} n_j (\bar{u}_{jk} - \bar{u}_{\cdot k})(\bar{u}_{jk'} - \bar{u}_{\cdot k'}).$$

Let the $k$th largest relative eigenvalue of $M_I$ relative to $M_e$ be $\mu_k$, and let the
corresponding relative eigenvector be $\hat{v}_k$. Let $\hat{\lambda}_k = K^{-1}(\mu_k - 1)$. Then
the estimate of the maximum possible $\rho^2(d_I, d)$ is $\hat{\lambda}_1/(\hat{\lambda}_1 + 1/n)$,
and the corresponding estimate of the maximum possible value of $\rho^2(f_I, f)$ is
$\hat{\lambda}_2/(\hat{\lambda}_2 + 1/n)$.
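The estimation steps just described can be sketched in a few lines. This is an illustrative sketch under stated assumptions rather than the procedure used for the reported results: it assumes NumPy, the function names are hypothetical, the data are simulated, and the constant K is taken to be the usual effective institution size for an unbalanced one-way layout, (N - sum of squared n_j / N) / (J - 1), which is an assumption here because its definition is not restated in this section.

```python
import numpy as np

def mean_cross_products(U, inst):
    """Within- and between-institution matrices of mean cross products.

    U is an (N, r) array of subscores; inst holds the N institution labels.
    """
    U = np.asarray(U, dtype=float)
    inst = np.asarray(inst)
    labels = np.unique(inst)
    N, r = U.shape
    J = len(labels)
    grand = U.mean(axis=0)
    within = np.zeros((r, r))
    between = np.zeros((r, r))
    for lab in labels:
        Uj = U[inst == lab]
        mj = Uj.mean(axis=0)
        R = Uj - mj                              # deviations from the institution mean
        within += R.T @ R
        dev = (mj - grand)[:, None]              # institution mean minus grand mean
        between += len(Uj) * (dev @ dev.T)
    return within / (N - J), between / (J - 1)

def effective_group_size(inst):
    """Assumed definition of K: (N - sum(n_j^2)/N) / (J - 1)."""
    _, counts = np.unique(np.asarray(inst), return_counts=True)
    N, J = counts.sum(), len(counts)
    return (N - (counts ** 2).sum() / N) / (J - 1)

def estimate_lambdas(M_e, M_I, K):
    """Relative eigenvalues mu_k of M_I with respect to M_e, mapped to (mu_k - 1)/K."""
    mu = np.linalg.eigvals(np.linalg.solve(M_e, M_I)).real
    mu = np.sort(mu)[::-1]
    return (mu - 1.0) / K

# Simulated data (hypothetical): one shared institutional effect across 3 subscores.
rng = np.random.default_rng(0)
J, n_per, r = 40, 25, 3
inst = np.repeat(np.arange(J), n_per)
effect = rng.normal(0.0, 0.6, size=J)
U = effect[inst][:, None] + rng.normal(0.0, 1.0, size=(J * n_per, r))

M_e, M_I = mean_cross_products(U, inst)
K = effective_group_size(inst)
lam_hat = estimate_lambdas(M_e, M_I, K)
print(lam_hat, lam_hat[0] / (lam_hat[0] + 1.0 / 30))
```

With a single shared institutional effect, as simulated here, only the first estimated eigenvalue should be clearly positive; the remaining estimates hover around zero and may be negative.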

Results for the two test forms are summarized in Table 4. 


Table 4.

Percent Reduction (100 × Proportional Reduction) in Mean-Squared Error
With Total Score, Optimal Linear Combination of Subscores,
and Optimal Second Linear Combination

                    Test form 1                                       Test form 2
  n    $\rho^2(x_I,x)$  $\rho^2(d_I,d)$  $\rho^2(f_I,f)$    $\rho^2(x_I,x)$  $\rho^2(d_I,d)$  $\rho^2(f_I,f)$

 30        0.923            0.925            0.427              0.901            0.910            0.505
100        0.976            0.976            0.713              0.968            0.971            0.773
150        0.984            0.984            0.788              0.978            0.981            0.834


Results for the optimal linear combination are virtually the same as results for the total
score $x$. For the case of the linear combination $f$, reliability is not very satisfactory without
an $n$ greater than 100, and the reliability is much lower than for $x$. Thus a fundamental
problem is that very little information appears available that is not provided by the total
score.
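The pattern in Table 4 follows directly from the form of the bound, in which reliability for n examinees equals a variance ratio divided by that ratio plus 1/n. As a rough check, and only as an illustrative sketch with hypothetical function names, the ratio implied by the n = 30 entry for test form 1 can be backed out and used to predict the entries for n = 100 and n = 150. The same form applies to the total score because it is also a linear combination of the subscores.

```python
def reliability(ratio, n):
    """Reliability of a mean of n examinees for a given institution-to-error variance ratio."""
    return ratio / (ratio + 1.0 / n)

def implied_ratio(rho2, n):
    """Invert rho2 = ratio / (ratio + 1/n) to recover the variance ratio."""
    return rho2 / (n * (1.0 - rho2))

# Test form 1 entries copied from Table 4.
table4_form1 = {
    "x": {30: 0.923, 100: 0.976, 150: 0.984},
    "d": {30: 0.925, 100: 0.976, 150: 0.984},
    "f": {30: 0.427, 100: 0.713, 150: 0.788},
}

predicted = {}
for name, row in table4_form1.items():
    ratio = implied_ratio(row[30], 30)
    predicted[name] = {n: round(reliability(ratio, n), 3) for n in (100, 150)}
    print(name, round(ratio, 4), predicted[name])
```

For these three columns the predictions agree with the tabled values to three decimals. The implied ratio for $f$ is far smaller than the ratio for $x$ or $d$, which is why much larger institutions are needed before the second combination becomes reliable.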
4. Discussion and Conclusion

This paper demonstrates that reporting subscores can be quite different at an 
institutional level than at an individual level even though the basic arguments are quite 
similar. Few studies explore this issue for operational tests, with the exception of Longford 
(1990), who analyzed data from the pilot stage of development of a test. Our suggested
analyses can be performed with output from standard statistical software and do not
involve difficult computations, so that routine use of the proposed methodology is quite
straightforward.

In the example under study, reporting examinee means on subscores does not appear 
to be justified for any realistic institution size, although reporting mean total scores for an 
institution does not appear to be problematic even for the smallest sample-size condition 
(30) examined. The results suggest that any possible use of subscores is most likely to 
succeed with more aggregated subscores and large institutions. 

The methods used in this report can be directly applied to score reporting at a different 
type of aggregation, say states rather than institutions. It is also a straightforward matter 
to extend the approach to a hierarchy of aggregations, say institutions within states. 

Another issue with reporting subscores for institutions is that equating and/or scaling 
for subscores is essential if information from more than a single form is to be used to 
characterize results for an institution. Although such information can be available in survey 
assessments such as NAEP, in typical cases that involve tests designed for assessment of 
individuals rather than groups, equating is available for the total score but not for subscores 
(for example, if an anchor test is used to equate the total test, only a few of the items 
will correspond to a particular subscore so that an anchor test equating of the subscore is 
not feasible). No proof exists that scaling is feasible in a particular application, and the 
possibility exists that scaling that may be rather adequate for an individual is far from 
satisfactory if applied to an institution, for the correlation structure for institutional means 
may be quite different than the correlation structure for individual results. It should also 
be emphasized that an application of scaling of subscores to the total score conceptually 
requires the subscore to measure the same construct as the total score, in which case there 
is no point reporting a subscore. Further, a subscore will typically involve too little data 
for accurate and precise scaling.

In the case of large institutions, it is prudent to perform outlier analysis to detect 
unusual distributions of subscores or total scores. Such analysis is quite distinct from any 
outlier analysis performed at an individual level. This is a possible area for future research. 

The combined estimate based on both the subscore mean and the total score mean is a 
reasonable candidate for some applications. However, the estimate did not help much, at 
least in the example under study. Further, the estimate is not easy to explain to institutional
users.

Analysis has been based on linear methods, so it is possible that other methods of 
analysis might yield different results. 
Notes

1 Note that the between-institution variance matrix obtained from the one-way MANOVA,
which is an estimate of $C_I$, has one large eigenvalue and a few negative eigenvalues for the
six-subscore case and three-subscore case for both test forms, which provides some evidence
that most of the between-institution variance lies in the total score and not in the subscores.